2019 Fall Conference
Korean Title |
파이썬을 이용한 다양한 형식의 웹 데이터 크롤링 기법 |
English Title |
Crawling Methods for Web Data of Various Formats Using Python |
Authors |
Li Seung (승리)
Sujin Yun (윤수진)
Young Woon Woo (우영운)
|
Citation |
Vol. 23, No. 2, pp. 343-346 (October 2019) |
Abstract |
In this paper, we propose several techniques for automatically collecting web data in various formats, such as online cafe and blog posts. We used the Python language and the Beautiful Soup library, which together support the proposed customized collection techniques and HTML selectors, and were able to automatically collect all of the text data posted on cafes and blogs built with special page structures. Using the proposed techniques, a Python web-crawling program could automatically collect large amounts of data even from web pages whose structures vary widely. We expect this work to be useful for implementing chatbots that require diverse conversational knowledge and for big data analysis research. |
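As a rough illustration of selector-based collection, the sketch below uses Beautiful Soup's `select` and `select_one` methods to pull post titles and bodies out of HTML. It assumes the third-party `beautifulsoup4` package is installed. The `SITE_RULES` table, the class names in the sample pages, and the `extract_posts` helper are all hypothetical; the paper's actual selectors and per-site techniques are not given here. Representing each page layout as its own selector set is only one assumed way to model a "custom collection technique" for a particular site structure.

```python
from bs4 import BeautifulSoup

# Hypothetical per-layout rules: each entry models one "custom collection
# technique" as CSS selectors for the post container and its text fields.
# These selectors are illustrative assumptions, not the paper's.
SITE_RULES = {
    "blog": {"post": "div.post", "title": "h2.title", "body": "div.content"},
    "cafe": {"post": "li.article", "title": "a.subject", "body": "p.summary"},
}


def extract_posts(html: str, site: str) -> list[dict[str, str]]:
    """Return the title and body text of every post matched by the site's selectors."""
    rules = SITE_RULES[site]
    # "html.parser" is Python's built-in parser backend, so no lxml is needed.
    soup = BeautifulSoup(html, "html.parser")
    posts = []
    for node in soup.select(rules["post"]):
        title = node.select_one(rules["title"])
        body = node.select_one(rules["body"])
        posts.append({
            # Missing fields become empty strings instead of raising errors.
            "title": title.get_text(strip=True) if title else "",
            # Join nested text pieces (e.g. several <p> tags) with single spaces.
            "body": body.get_text(" ", strip=True) if body else "",
        })
    return posts


# Hypothetical sample pages with two different structures.
BLOG_HTML = """
<div class="post"><h2 class="title">First post</h2>
  <div class="content"><p>Hello</p><p>world</p></div></div>
<div class="post"><h2 class="title">Second post</h2>
  <div class="content">Short note</div></div>
"""

CAFE_HTML = """
<ul>
  <li class="article"><a class="subject">Question</a><p class="summary">How do I crawl?</p></li>
</ul>
"""

if __name__ == "__main__":
    for post in extract_posts(BLOG_HTML, "blog"):
        print(post)
    print(extract_posts(CAFE_HTML, "cafe"))
```

Under this design, supporting a new page layout means adding one entry to `SITE_RULES` rather than writing a new parser. Fetching live pages and following pagination are deliberately left out so the sketch runs offline.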
Keywords |
Web-crawling
Python
BeautifulSoup
HTML selector
Big data
|